This is an R Markdown Notebook.
The American National Election Studies (ANES) are surveys of voters in the U.S. on the national scale. For each presidential election since 1948, ANES collects responses from respondents both before and after the election. The goal of ANES is to understand political behaviors using systematic surveys. ANES’ data and results have been routinely used by news outlets, election campaigns and political researchers.
library(usethis)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts --------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(haven)
library(devtools)
library(RColorBrewer)
library(DT)
library(ggplot2)
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
library(stringr)
anes_ts_cdf <- read_sav("D:/Columbia University/GR 5243/Project 1/Fall2020-Project1-zlj-0131/data/anes_timeseries_cdf.sav")
# Take a look at the data sets
head(anes_ts_cdf)
For this project, I would start from the question of is there a relationship between voters’ religious background and the party they would like to vote. I would concentrate on the two major parties, Democrat and republican, because they are always the candidates of the President election.
Data process for Question 1 (Religious background - Vote Party):
Election_years = as.character(seq(1952, 2016, 4))
Q1_data <- anes_ts_cdf %>%
mutate(
year = as_factor(VCF0004),
vote = as_factor(VCF0704),
religion = as_factor(VCF0128)
) %>%
filter(year %in% Election_years) %>%
filter(str_detect(vote, c("Democrat", "Republican")))
data.table(Q1_data %>%
select(year, vote, religion) %>%
sample_n(30))
Q1_data %>% select(year, vote, religion)
Q1_data <- Q1_data %>% select(year, vote, religion)
save(Q1_data, file = "../output/data_use.RData")
I am also interested in the relationship between Voters’ education background and the party they would like to vote.
Data process for Question 2 (Education Background - Vote Party):
Q2_data <- anes_ts_cdf %>%
mutate(
year = as_factor(VCF0004),
vote = as_factor(VCF0704),
education = as_factor(VCF0110)
) %>%
filter(year %in% Election_years) %>%
filter(str_detect(vote, c("Democrat", "Republican")))
data.table(Q2_data %>%
select(year, vote, education) %>%
sample_n(30))
Q2_data %>% select(year, vote, education)
Q2_data <- Q2_data %>% select(year, vote, education)
save(Q2_data, file = "../output/data_use.RData")
I am also interested in the relationship between Voters’ occupation and the party they would like to vote.
Data process for Question 3 (Occupations - Vote Party):
Q3_data <- anes_ts_cdf %>%
mutate(
year = as_factor(VCF0004),
vote = as_factor(VCF0704),
occupation = as_factor(VCF0115)
) %>%
filter(year %in% Election_years) %>%
filter(str_detect(vote, c("Democrat", "Republican")))
data.table(Q3_data %>%
select(year, vote, occupation) %>%
sample_n(30))
Q3_data %>% select(year, vote, occupation)
Q3_data <- Q3_data %>% select(year, vote, occupation)
save(Q3_data, file = "../output/data_use.RData")
Data process for new derivative question:
newQ_data <- anes_ts_cdf %>%
mutate(
year = as_factor(VCF0004),
vote = as_factor(VCF0704),
education = as_factor(VCF0110),
religion = as_factor(VCF0128),
occupation = as_factor(VCF0115)
) %>%
filter(year %in% Election_years) %>%
filter(str_detect(vote, c("Democrat", "Republican")))
data.table(newQ_data %>%
select(year, vote, education, religion, occupation) %>%
sample_n(30))
newQ_data %>% select(year, vote, education, religion, occupation)
newQ_data <- newQ_data %>% select(year, vote, education, religion, occupation)
save(newQ_data, file = "../output/data_use.RData")
load(file = "../output/data_use.RData")
Q1_plotdata <- Q1_data %>%
filter(!(str_detect(religion, "NA"))) %>%
group_by(year, religion) %>%
count(vote) %>%
group_by(year, religion) %>%
mutate(
prop = n/sum(n)
)
ggplot(Q1_plotdata,
aes(x = year, y = prop, fill = vote)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("steelblue2", "red3")) +
facet_wrap(~ religion, ncol = 1) +
theme_classic() +
labs(
title = "Relationship between votes' religious backgroup with their choice of party voting",
y = "proportion"
)
By ignoring the fourth category of other and none, we focus on the three major regions in States. It is hard to find a trend in Jewish, but we could see in most of the time, the majority of voters who are Jewish are with the Democrat. Also, we would find a clearly upward trend of the proportions of voters who are Protestant or Catholic voting to Republican are increasing during the most three recent elections. Let us take a closer look of the line charts of the votes.
ggplot(Q1_plotdata %>%
filter(vote == "2. Republican") %>%
filter(year %in% c(2008, 2012, 2016)) %>%
filter(religion %in% c("1. Protestant", "2. Catholic [Roman Catholic]")),
aes(x = year, y = prop, group = religion, color = religion)) +
geom_point() +
geom_line() +
theme_classic() +
labs(
title = "Trends of voters who are Protestant or Catholic voting to Republican ",
y = "proportion of votes to Republican"
)
Let us put this interesting finding aside for a moment, because I am also interested in if there exist any relationship between voters’ education background and the party they support.
Q2_plotdata <- Q2_data %>%
filter(!(str_detect(education, "NA"))) %>%
group_by(year, education) %>%
count(vote) %>%
group_by(year, education) %>%
mutate(
prop = n/sum(n)
)
ggplot(Q2_plotdata,
aes(x = year, y = prop, fill = vote)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("steelblue2", "red3")) +
facet_wrap(~ education, ncol = 1) +
theme_classic() +
labs(
title = "Relationship between votes' education background with their choice of party voting",
y = "proportion"
)
By viewing the bar chart above, we cannot find a significant trend for education level under college or advanced degree. However, there is a obvious trend for the college or advanced degree, and this group of people are more and more likely to vote for the Democrat. Also, let us take a close look at the vote result in the most three recent elections.
ggplot(Q2_plotdata %>%
filter(vote == "2. Republican") %>%
filter(year %in% c(2008, 2012, 2016)),
aes(x = year, y = prop, group = education, color = education)) +
geom_point() +
geom_line() +
theme_classic() +
labs(
title = "Trends of voters with different education level voting to Republican ",
y = "proportion of votes to Republican"
)
According to the line plot above, we can clearly see a upward trend of the proportion of voters voting to republican, who actually have weaker education background.
The similar upward trends of the support rating for Republican from people with weaker education background and people who are Protestant or Catholic make me wondering if there exist any correlation between these two group of people.
newQ_plotdata <- newQ_data %>%
filter(!(str_detect(education, "NA"))) %>%
filter(!(str_detect(religion, "NA"))) %>%
filter(!(str_detect(occupation, "NA"))) %>%
group_by(education) %>%
count(religion) %>%
group_by(education) %>%
mutate(
prop = n/sum(n)
)
ggplot(newQ_plotdata,
aes(x = education, y = prop, fill = religion)) +
geom_bar(stat = "identity") +
scale_x_discrete(labels = c("0-8 grades", "High school", "Some College", "College or higher"),
guide = guide_axis(n.dodge = 2)) +
theme_classic() +
labs(
title = "",
y = "proportion"
)
Let’s take a closer look of the line chart.
ggplot(newQ_plotdata %>%
filter(religion %in% c("1. Protestant", "2. Catholic [Roman Catholic]")),
aes(x = education, y = prop, group = religion, color = religion)) +
geom_point() +
geom_line() +
scale_x_discrete(labels = c("0-8 grades", "High school", "Some College", "College or higher"),
guide = guide_axis(n.dodge = 2)) +
theme_classic() +
labs(
title = "",
y = "proportion"
)
There is no obvious relationship between voters being Catholic and education, but we could find the obvious negative relationship between voters being a Protestant and the education background. People with weaker education background are tend to be a Protestant.
Q3_plotdata <- Q3_data %>%
filter(!(str_detect(occupation, "NA"))) %>%
group_by(year, occupation) %>%
count(vote) %>%
group_by(year, occupation) %>%
mutate(
prop = n/sum(n)
)
ggplot(Q3_plotdata,
aes(x = year, y = prop, fill = vote)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("steelblue2", "red3")) +
facet_wrap(~ occupation, ncol = 1) +
theme_classic() +
labs(
title = "Relationship between votes' occupation with their choice of party voting",
y = "proportion"
)
By looking at the graph above, I find something interesting about the voters who are Laborers, who I expect having weaker educational background. In the previous finding, we know that people who have weaker educational background are tend to vote for Republican. Nevertheless, in this chart, Laborers are highly inclined to vote for Democrat. To validate my assumption of laborers having weaker educational background, let us construct another graph.
newQ_plotdata_2 <- newQ_data %>%
filter(!(str_detect(education, "NA"))) %>%
filter(!(str_detect(religion, "NA"))) %>%
filter(!(str_detect(occupation, "NA"))) %>%
group_by(occupation) %>%
count(education) %>%
group_by(occupation) %>%
mutate(
prop = n/sum(n)
)
ggplot(newQ_plotdata_2,
aes(x = occupation, y = prop, fill = education)) +
geom_bar(stat = "identity") +
scale_x_discrete(label = c("Professional", "Cler/Sale", "Skilled/Semi-Skilled", "Labor(no farmer)", "Farmers", "Homemakers"),
guide = guide_axis(n.dodge = 3)) +
theme_classic() +
labs(
title = "",
y = "proportion"
)
newQ_plotdata_2 %>%
filter(occupation == "4. Laborers, except farm") %>%
filter(education != "4. College or advanced degree (no cases 1948)") %>%
mutate(sum_prop = sum(prop))
As I expected, Laborers(except farm) are having weaker education background, and 98% of them are below college degree.
By starting with the question of if there exist a relationship between voters’ religious background and the party they would like to vote. We observed that in most of the time, the majority of voters who are Jewish are with the Democrat. At the same time, we would find a clearly upward trend of the proportions of voters who are Protestant or Catholic voting to Republican are increasing during the most three recent elections.
Then, we moved to the next question: if there exist any relationship between voters’ education background and the party they support. People who have college or advanced degree are more and more likely to vote for the Democrat in the past few decades. Also, we could find a upward trend of the proportion of voters voting to republican, who actually have weaker education background. Under deeper investigation, we indeed find the obvious negative relationship between voters being a Protestant and the education background. People with weaker education background are tend to be a Protestant and are likely to vote to Republican.
Nevertheless, in our last question of investigating the relationship between voters’ occupation and the party they support, we had a surprising finding. For laborers (except farm), who have weaker education background, according to our previous finding, we would expect that they like to support Republican. However, our data told us they actually like to vote for Democrat.
All of the above are the interesting findings I found through this data exploration journey.